Types of validity in early theory yarmohammadi
'''THREE ‘TYPES’ OF VALIDITY IN EARLY THEORY'''

In the early days of validity investigation, validity was broken down into three ‘types’ that were typically seen as distinct. Each type of validity was related to the kind of evidence that would count towards demonstrating that a test was valid. Cronbach and Meehl described these as:

■ Criterion-oriented validity
   Predictive validity
   Concurrent validity
■ Content validity
■ Construct validity

'''Criterion-oriented validity'''

When considering criterion-oriented validity, the tester is interested in the relationship between a particular test and a criterion about which we wish to make predictions. For example, I may wish to predict from scores on a test of second-language academic reading ability whether individuals can cope with first-semester undergraduate business studies texts in an English-medium university. What we are really interested in here is the criterion: whatever it is that we wish to know about, but for which we don’t have any direct evidence. In the example above we cannot see whether future students can do the reading that will be expected of them before they actually arrive at the university and start their course. In this case the validity evidence is the strength of the predictive relationship between the test score and performance on the criterion. Of course, it is necessary to decide what would count as ‘ability to cope with’, as it is something that must be measurable. Defining precisely what we mean by such words and phrases is a central part of investigating validity.

''Predictive validity'' is the term used when the test scores are used to predict some future criterion, such as academic success. If the scores are used to predict a criterion at the same time the test is given, we are studying ''concurrent validity''.
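The strength of the relationship between test scores and a criterion is commonly summarized as a correlation coefficient. The sketch below illustrates that idea only. The scores are invented, the variable names are hypothetical, and the choice of end-of-semester marks as the measure of ‘ability to cope with’ is an assumption made for the example, not something the notes above specify.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data, invented for illustration only.
# Predictor: scores on the academic reading test, taken before the course.
reading_test_scores = [42, 55, 61, 48, 70, 66, 53, 75]
# Criterion: an assumed measure of "ability to cope", here first-semester marks.
first_semester_marks = [50, 58, 64, 49, 72, 69, 60, 78]

validity_coefficient = pearson_r(reading_test_scores, first_semester_marks)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

A coefficient close to 1 would count as evidence that the test predicts the criterion well; a coefficient near 0 would not. The same calculation could serve a concurrent validity study, where the only difference is that the criterion is measured at the same time as the test.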
Suppose, for example, that we wish to produce a shorter version of an existing reading test. As we know that shorter tests mean that we collect less evidence about reading ability, one of the questions we would wish to ask is to what extent the shorter test is capable of predicting the scores on the longer test. In other words, could the shorter test replace the longer test and still be useful? This would be an example of a ''concurrent validity study'' that uses the longer test as the criterion.

'''Content validity'''

Content validity is defined as any attempt to show that the content of the test is a representative sample from the domain that is to be tested. In our example of the academic reading test it would be necessary to show that the texts selected for the test are typical of the types of texts that would be used in first-year undergraduate business courses. This is usually done using expert judges. These may be subject teachers, or language teachers who have many years’ experience in teaching business English. The judges are asked to look at texts that have been selected for inclusion on the test and evaluate them for their representativeness within the content area.

Secondly, the items used on the test should result in responses to the text from which we can make inferences about the test takers’ ability to process the texts in ways expected of students on their academic courses. For example, we may discover that business students are primarily required to read texts to extract key factual information, take notes and use the notes in writing assignments. In our reading test we would then try to develop items that tap the ability to identify key facts.

'''Construct validity'''

The first problem with construct validity is defining what a ‘construct’ is. Perhaps the easiest way to understand the term ‘construct’ is to think of the many abstract nouns that we use on a daily basis, but for which it would be extremely hard to point to an example. For a general term to become a construct, it must have two further properties.
Firstly, it must be defined in such a way that it becomes measurable. In order to measure ‘fluency’ we have to state what we could possibly observe in speech to make a decision about whether a speaker is fluent. It turns out that many people have different definitions of fluency, ranging from simple speed of speech, to lack of hesitation (or strictly ‘pauses’, because ‘hesitation’ is a construct itself), to specific observable features of speech.

Secondly, any construct should be defined in such a way that it can have relationships with other constructs that are different. For example, if I generate descriptions of ‘fluency’ and ‘anxiety’ I may hypothesize that, as anxiety increases, fluency will decrease, and vice versa. If this hypothesis is tested and can be supported, we have the very primitive beginnings of a theory of speaking that relates how we perform to emotional states.

To put this another way, concepts become constructs when they are so defined that they can become ‘operational’. That is, we can measure them in a test of some kind by linking the term to something observable (whether this is ticking a box or performing some communicative action), and we can establish the place of a construct in a theory that relates one construct to another.

'''Construct validity and truth'''

In the early history of validity theory there was an assumption that there is such a thing as a ‘psychologically real construct’ that has an independent existence in the test taker, and that the test scores represent the degree of presence or absence of this very real property. As Cronbach and Meehl put it:

‘Construct validation takes place when an investigator believes that his instrument reflects a particular construct, to which are attached certain meanings. The proposed interpretation generates specific testable hypotheses, which are a means of confirming or disconfirming the claim.’

This brings us to our first philosophical observation.
It has frequently been argued that early validity theorists were positivistic in their outlook. That is, they assumed that their constructs actually existed in the heads of the test takers. Again, Cronbach and Meehl state:

‘Scientifically speaking, to “make clear what something is” means to set forth the laws in which it occurs. We shall refer to the interlocking system of laws which constitute a theory as a nomological network.’

The idea of a nomological network is not difficult to grasp. Firstly, it contains a number of constructs, and their names are abstract nouns of the kind discussed above. In language teaching and testing, ‘fluency’ and ‘accuracy’ are two well-known constructs. Secondly, the nomological network contains the observable variables: those things that we can see and measure directly, whereas we cannot see ‘fluency’ and ‘accuracy’ directly. Whatever we choose to observe makes up the definition of the constructs. For fluency we may wish to observe speed of delivery or the number of unfilled pauses, for example. For accuracy, we could look at the ratio of correct to incorrect tense use or word order. From what we can observe, we then make an inference about how ‘fluent’ or how ‘accurate’ a student’s use of the second language is.

The network is created by asking what we expect the relationship between ‘fluency’ and ‘accuracy’ to be. One hypothesis could be that in speech, as fluency increases, accuracy decreases, because learners cannot pay attention to form when the demands of processing take up all the capacity of short-term memory. Another hypothesis could be that, as accuracy increases, the learner becomes more fluent, because language form has become automatic. Stating this kind of relationship between constructs therefore constitutes a theory, and theory is very powerful.
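The two steps described above, linking each construct to something observable and then stating an expected relationship between constructs, can be sketched in a few lines. Everything specific here is an assumption made for illustration: the `SpeechSample` fields, the choice of words per minute for fluency, the use of the share of correct tense forms (rather than a raw ratio) for accuracy, the invented learner data, and the labels ‘trade-off’ and ‘automaticity’ for the two hypotheses.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeechSample:
    """Observable counts from one learner's recorded speech (hypothetical fields)."""
    words: int
    seconds: float
    correct_forms: int
    incorrect_forms: int

    def fluency(self) -> float:
        # Assumed operational definition: speed of delivery in words per minute.
        return self.words / self.seconds * 60

    def accuracy(self) -> float:
        # Assumed operational definition: share of tense uses that are correct.
        total = self.correct_forms + self.incorrect_forms
        if total == 0:
            raise ValueError("no tense uses observed in this sample")
        return self.correct_forms / total


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def describe_relationship(r: float) -> str:
    """Say which of the two hypotheses the sign of r is consistent with."""
    if r < 0:
        return "fluency rises as accuracy falls (trade-off hypothesis)"
    if r > 0:
        return "fluency and accuracy rise together (automaticity hypothesis)"
    return "no linear relationship observed"


# Hypothetical learners, invented for illustration only.
samples = [
    SpeechSample(words=90, seconds=60, correct_forms=18, incorrect_forms=2),
    SpeechSample(words=110, seconds=60, correct_forms=16, incorrect_forms=4),
    SpeechSample(words=130, seconds=60, correct_forms=15, incorrect_forms=5),
    SpeechSample(words=150, seconds=60, correct_forms=12, incorrect_forms=8),
]

fluency_scores = [s.fluency() for s in samples]
accuracy_scores = [s.accuracy() for s in samples]
r = pearson_r(fluency_scores, accuracy_scores)
print(describe_relationship(r))
```

With real data, a different operational definition (for instance, counting unfilled pauses instead of words per minute) could lead to a different result, which is why the definition of each construct is itself part of the theory being tested.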